AITopics | mean teacher

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

Neural Information Processing SystemsMar-17-2026, 14:38:09 GMT

The recently proposed Temporal Ensembling has achieved state-of-the-art results in several semi-supervised learning benchmarks. It maintains an exponential moving average of label predictions on each training example, and penalizes predictions that are inconsistent with this target. However, because the targets change only once per epoch, Temporal Ensembling becomes unwieldy when learning large datasets. To overcome this problem, we propose Mean Teacher, a method that averages model weights instead of label predictions. As an additional benefit, Mean Teacher improves test accuracy and enables training with fewer labels than Temporal Ensembling. Without changing the network architecture, Mean Teacher achieves an error rate of 4.35% on SVHN with 250 labels, outperforming Temporal Ensembling trained with 1000 labels. We also show that a good network architecture is crucial to performance. Combining Mean Teacher and Residual Networks, we improve the state of the art on CIFAR-10 with 4000 labels from 10.55% to 6.28%, and on ImageNet 2012 with 10% of the labels from 35.24% to 9.11%.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.98)

Add feedback

HASSOD: Hierarchical Adaptive Self-Supervised Object Detection

Neural Information Processing SystemsFeb-16-2026, 19:07:52 GMT

Through extensive experiments on prevalent image datasets, we demonstrate the superiority of HASSOD over existing methods, thereby advancing the state of the art in self-supervised object detection. Notably, we improve Mask AR from 20.2 to 22.5 on L VIS, and from 17.0 to 26.0 on SA-1B.

artificial intelligence, image understanding, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Illinois (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.46)

Add feedback

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

Neural Information Processing SystemsNov-21-2025, 15:28:24 GMT

The recently proposed Temporal Ensembling has achieved state-of-the-art results in several semi-supervised learning benchmarks. It maintains an exponential moving average of label predictions on each training example, and penalizes predictions that are inconsistent with this target. However, because the targets change only once per epoch, Temporal Ensembling becomes unwieldy when learning large datasets. To overcome this problem, we propose Mean Teacher, a method that averages model weights instead of label predictions. As an additional benefit, Mean Teacher improves test accuracy and enables training with fewer labels than Temporal Ensembling. Without changing the network architecture, Mean Teacher achieves an error rate of 4.35% on SVHN with 250 labels, outperforming Temporal Ensembling trained with 1000 labels. We also show that a good network architecture is crucial to performance. Combining Mean Teacher and Residual Networks, we improve the state of the art on CIFAR-10 with 4000 labels from 10.55% to 6.28%, and on ImageNet 2012 with 10% of the labels from 35.24% to 9.11%.

better role model, semi-supervised deep learning result, temporal ensembling, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.98)

Add feedback

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

Antti Tarvainen, Harri Valpola

Neural Information Processing SystemsNov-21-2025, 10:18:25 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, prediction, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Industry: Education (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

b9ecf4d84999a61783c360c3782e801e-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 05:56:23 GMT

artificial intelligence, image understanding, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States > Illinois (0.04)

Genre: Research Report (0.67)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.46)

Add feedback

Improving Respiratory Sound Classification with Architecture-Agnostic Knowledge Distillation from Ensembles

Toikkanen, Miika, Kim, June-Woo

arXiv.org Artificial IntelligenceMay-29-2025

Respiratory sound datasets are limited in size and quality, making high performance difficult to achieve. Ensemble models help but inevitably increase compute cost at inference time. Soft label training distills knowledge efficiently with extra cost only at training. In this study, we explore soft labels for respiratory sound classification as an architecture-agnostic approach to distill an ensemble of teacher models into a student model. We examine different variations of our approach and find that even a single teacher, identical to the student, considerably improves performance beyond its own capability, with optimal gains achieved using only a few teachers. We achieve the new state-of-the-art Score of 64.39 on ICHBI, surpassing the previous best by 0.85 and improving average Scores across architectures by more than 1.16. Our results highlight the effectiveness of knowledge distillation with soft labels for respiratory sound classification, regardless of size or architecture.

artificial intelligence, ensemble, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2505.22027

Genre: Research Report > New Finding (0.54)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

A mean teacher algorithm for unlearning of language models

Klochkov, Yegor

arXiv.org Artificial IntelligenceApr-21-2025

One of the goals of language model unlearning is to reduce memorization of selected text instances while retaining the model's general abilities. Despite various proposed methods, reducing memorization of large datasets without noticeable degradation in model utility remains challenging. In this paper, we investigate the mean teacher algorithm (Tarvainen & Valpola, 2017), a simple proximal optimization method from continual learning literature that gradually modifies the teacher model. We show that the mean teacher can approximate a trajectory of a slow natural gradient descent (NGD), which inherently seeks low-curvature updates that are less likely to degrade the model utility. While slow NGD can suffer from vanishing gradients, we introduce a new unlearning loss called "negative log-unlikelihood" (NLUL) that avoids this problem. We show that the combination of mean teacher and NLUL improves some metrics on the MUSE benchmarks (Shi et al., 2024).

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2504.13388

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

Reviews: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

Neural Information Processing SystemsOct-7-2024, 23:21:55 GMT

The paper proposes a new method for using unlabeled data in semi-supervised learning. The idea is to construct a teacher network from student network during training by using an exponentially decaying moving average of the weights of the student network, updating after each batch. This is inspired by previous work that uses a temporal ensemble of the softmax outputs, and aims to reduce the variance of the targets during training. Noise of various forms is added to both labelled and unlabeled examples, and a L2 penalty is added to encourage the student outputs to be consistent with the teachers. As the authors mention, this acts as a kind of soft adaptive label propagation mechanism. The advantage of their approach over temporal ensembling is that it can be used in the online setting.

better role model, mean teacher, semi-supervised deep learning result, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Add feedback

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

Antti Tarvainen, Harri Valpola

Neural Information Processing SystemsOct-3-2024, 13:30:05 GMT

The recently proposed Temporal Ensembling has achieved state-of-the-art results in several semi-supervised learning benchmarks. It maintains an exponential moving average of label predictions on each training example, and penalizes predictions that are inconsistent with this target. However, because the targets change only once per epoch, Temporal Ensembling becomes unwieldy when learning large datasets. To overcome this problem, we propose Mean Teacher, a method that averages model weights instead of label predictions. As an additional benefit, Mean Teacher improves test accuracy and enables training with fewer labels than Temporal Ensembling. Without changing the network architecture, Mean Teacher achieves an error rate of 4.35% on SVHN with 250 labels, outperforming Temporal Ensembling trained with 1000 labels. We also show that a good network architecture is crucial to performance. Combining Mean Teacher and Residual Networks, we improve the state of the art on CIFAR-10 with 4000 labels from 10.55% to 6.28%, and on ImageNet 2012 with 10% of the labels from 35.24% to 9.11%.

arxiv, prediction, temporal ensembling, (14 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Industry: Education (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.91)

Add feedback

Table-Filling via Mean Teacher for Cross-domain Aspect Sentiment Triplet Extraction

Peng, Kun, Jiang, Lei, Li, Qian, Li, Haoran, Yu, Xiaoyan, Sun, Li, Sun, Shuo, Bi, Yanxian, Peng, Hao

arXiv.org Artificial IntelligenceJul-23-2024

Cross-domain Aspect Sentiment Triplet Extraction (ASTE) aims to extract fine-grained sentiment elements from target domain sentences by leveraging the knowledge acquired from the source domain. Due to the absence of labeled data in the target domain, recent studies tend to rely on pre-trained language models to generate large amounts of synthetic data for training purposes. However, these approaches entail additional computational costs associated with the generation process. Different from them, we discover a striking resemblance between table-filling methods in ASTE and two-stage Object Detection (OD) in computer vision, which inspires us to revisit the cross-domain ASTE task and approach it from an OD standpoint. This allows the model to benefit from the OD extraction paradigm and region-level alignment. Building upon this premise, we propose a novel method named \textbf{T}able-\textbf{F}illing via \textbf{M}ean \textbf{T}eacher (TFMT). Specifically, the table-filling methods encode the sentence into a 2D table to detect word relations, while TFMT treats the table as a feature map and utilizes a region consistency to enhance the quality of those generated pseudo labels. Additionally, considering the existence of the domain gap, a cross-domain consistency based on Maximum Mean Discrepancy is designed to alleviate domain shift problems. Our method achieves state-of-the-art performance with minimal parameters and computational costs, making it a strong baseline for cross-domain ASTE.

aspect sentiment triplet extraction, proceedings, table-filling, (12 more...)

arXiv.org Artificial Intelligence

2407.21052

Country:

North America > United States > Idaho > Ada County > Boise (0.05)
Asia > China > Beijing > Beijing (0.05)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Filters

Collaborating Authors

mean teacher

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

HASSOD: Hierarchical Adaptive Self-Supervised Object Detection

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

b9ecf4d84999a61783c360c3782e801e-Paper-Conference.pdf

Improving Respiratory Sound Classification with Architecture-Agnostic Knowledge Distillation from Ensembles

A mean teacher algorithm for unlearning of language models

Reviews: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

Table-Filling via Mean Teacher for Cross-domain Aspect Sentiment Triplet Extraction